Method for Visualizing the Directional Sound Activity of a Multichannel Audio Signal

ABSTRACT

A method for visualizing a directional sound activity of a multichannel audio signal, wherein the multichannel audio signal comprises time-dependent input audio signals, comprising: determining, for each one of a plurality of time-frequency tiles, a directional sound activity vector from virtual sound sources determined from an active directional vector and a reactive directional vector, themselves determined from time-frequency representations of different input audio signals; determining a contribution of each one of said directional sound activity vectors within sub-divisions of space on the basis of directivity information related to each sub-division of space; determining a directional sound activity level within each sub-division of space by summing said contributions; and displaying a visualization of the directional sound activity of the multichannel audio signal by a graphical representation of the directional sound activity level within said sub-division of space.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 U.S.C. 119(a) to European Patent Application No. 16306190.6 filed on Sep. 19, 2016, all of which is hereby expressly incorporated by reference into the present application.

BACKGROUND OF THE INVENTION

The invention relates to a method and apparatus for visualizing the directional sound activity of a multichannel audio signal.

Audio is an important medium for conveying any kind of information, especially sound direction information. Indeed, the human auditory system is more effective than the visual system for surveillance tasks. Thanks to the development of multichannel audio formats, spatialization has become a common feature in all domains of audio: movies, video games, virtual reality, music, etc. For instance, when playing a First Person Shooting (FPS) game using a multichannel sound system (5.1 or 7.1 surround sound), it is possible to localize enemies thanks to their sounds.

Typically, such sounds are mixed onto multiple audio channels, wherein each channel is fed to a dedicated loudspeaker. Distribution of a sound to the different channels is adapted to the configuration of the dedicated playback system (positions of the loudspeakers), so as to reproduce the intended directionality of said sound.

Multichannel audio streams thus need to be played back over suitable loudspeaker layouts. For instance, each of the channels of a five-channel formatted audio signal is associated with its corresponding loudspeaker within a five-loudspeaker array. FIG. 1 shows an example of a five-channel loudspeaker layout recommended by the International Telecommunication Union (ITU), with a left loudspeaker L, right loudspeaker R, center loudspeaker C, surround left loudspeaker LS and surround right loudspeaker RS, arranged around a reference listening point O, which is the recommended position of the listener. With this reference listening point O as a center, the relative angular distances between the central directions of the loudspeakers are indicated.

A multichannel audio signal is thus encoded according to an audio file format dedicated to a prescribed spatial configuration where loudspeakers are arranged at prescribed positions with respect to a reference listening point. Indeed, each time-dependent input audio signal of the multichannel audio signal is associated with a channel, each channel corresponding to a prescribed position of a loudspeaker.

If multichannel audio is played back over an appropriate sound system, i.e. with the required number of loudspeakers and correct angular distances between them, a normal-hearing listener is able to detect the location of the sound sources that compose the multichannel audio mix. However, should the sound system exhibit inappropriate features, such as too few loudspeakers or inaccurate angular distances between them, the directional information of the audio content may not be delivered properly to the listener. This is especially the case when sound is played back over headphones.

As a consequence, there is in this case a loss of information since the multichannel audio signal conveys sound direction information through the respective sound levels of the channels, but such information cannot be delivered to the user. Accordingly, there is a need for conveying to the user the sound direction information encoded in the multichannel audio signal.

Some methods have been provided for conveying directional information related to sound through the visual modality. However, these methods were often a mere juxtaposition of volume meters, each dedicated to a particular loudspeaker, and thus unable to render precisely the simultaneous predominant directions of the sounds that compose the multichannel audio mix, except in the case of one unique virtual sound source whose direction coincides with a loudspeaker direction. Other methods intended to display sound locations more precisely are so complicated that they prove inadequate, since sound directions cannot be readily derived by a user.

For example, U.S. patent application US 2009/0182564 describes a method wherein the sound power level of each channel is displayed, or alternatively wherein the position and power level of elementary sound components are displayed.

U.S. Pat. No. 9,232,337 B2 describes a method for visualizing a directional sound activity of a multichannel audio signal that displays a visualization of a directional sound activity of the multichannel audio signal through a graphical representation of directional sound activity level within a sub-division of space. For a channel and for a frequency sub-band, a sound activity vector is formed by associating the sound activity level corresponding to the frequency-domain signal of said channel and said sub-band to the unit vector corresponding to the spatial information associated with said channel. In an embodiment of this patent, the energy vector sum representative of the perceived directional energy is directly calculated using Gerzon's energy vectors, as a mere summation of the sound activity vectors related to the channels for said frequency sub-band. This directional sound activity vector represents the predominant sound direction that would be perceived by a listener according to the recommended loudspeaker layout for sounds within that particular frequency sub-band.

However, although this method visually renders the main sound direction, it may not always achieve optimal results for a user. Indeed, this method does not exploit diffuse sounds, but focuses on identifying and displaying the main sound directions, regardless of the nature of the sound (directivity or diffuseness). As a result, when the sound is very diffuse, it may not be able to correctly extract a useful main sound direction from the noisy environment.

SUMMARY OF THE INVENTION

The method and system according to the invention are intended to provide a simple and clear visualization of sound activity in any direction.

In accordance with a first aspect of the present invention, this object is achieved by a method for visualizing a directional sound activity of a multichannel audio signal, wherein the multichannel audio signal comprises time-dependent input audio signals, each time-dependent input audio signal being associated with an input channel, spatial information with respect to a reference listening point being associated with each one of said channels, the method comprising:

-   receiving the time-dependent input audio signals;
-   performing a time-frequency conversion of said time-dependent input audio signals for converting each one of the time-dependent input audio signals into a plurality of time-frequency representations for the input channel associated with said time-dependent input audio signal, each time-frequency representation corresponding to a time-frequency tile defined by a time frame and a frequency sub-band, the time-frequency tiles being the same for the different input channels;
-   for each time-frequency tile, determining positions of at least two virtual sound sources with respect to the reference listening point and frequency signal values for each virtual sound source from an active directional vector and a reactive directional vector determined from time-frequency representations of different input audio signals for said time-frequency tile, wherein the active directional vector is determined from a real part of a complex intensity vector and the reactive directional vector is determined from an imaginary part of the complex intensity vector;
-   for each time-frequency tile, determining a directional sound activity vector from the virtual sound sources;
-   determining a contribution of each one of said directional sound activity vectors within sub-divisions of space on the basis of directivity information related to each sub-division of space;
-   for each sub-division of space, determining directional sound activity level within said sub-division of space by summing said contributions within said sub-division of space;
-   displaying a visualization of the directional sound activity of the multichannel audio signal by a graphical representation of directional sound activity level within said sub-division of space.

Other preferred, although non-limitative, aspects of the method of the invention are as follows, isolated or in a technically feasible combination:

-   the active directional vector of a time-frequency tile is representative of the sound energy flow at the reference listening point for the time frame and the frequency sub-band of said time-frequency tile, and the reactive directional vector is representative of acoustic perturbations at the reference listening point with respect to the sound energy flow;
-   each input channel is associated with a sound direction defined between the reference listening point and the prescribed position of the speaker associated with said input channel, and a sound velocity vector is determined as a function of a sum of each sound direction weighted by the time-frequency representation corresponding to the input channel associated with said sound direction, said sound velocity vector being used to determine the active directional vector and the reactive directional vector;
-   a sound pressure value defined by a sum of the time-frequency representations of the different input channels is used to determine the active directional vector and the reactive directional vector;
-   the complex intensity vector results from a complex product between a conjugate of a sound pressure value for a time-frequency tile and a sound velocity vector for said time-frequency tile;
-   for determining time-frequency signal values of each one of the virtual sound sources, virtual microphone signals are determined, each virtual microphone signal being associated with a virtual sound source and corresponding to the signal that would be acquired by a virtual microphone arranged at the reference listening point and oriented in the direction toward the position of said virtual sound source;
-   the time-frequency signal value of a virtual sound source is determined by suppressing, in the virtual microphone signal associated with said virtual sound source, the interferences from other virtual sound sources;
-   the virtual sound sources are arranged on a circle centered on the reference listening point and a virtual microphone signal corresponds to the signal that would be acquired by a virtual cardioid microphone having a directivity pattern in the shape of a cardioid tangential to the circle centered on the reference listening point;
-   there are three virtual sound sources for each time-frequency tile, each virtual sound source having a position with respect to the reference listening point, wherein:
    -   a position of a first virtual sound source defines with the reference listening point a direction which is collinear to the direction of the active directional vector from the reference listening point,
    -   a position of a second virtual sound source defines with the reference listening point a direction which is collinear to the direction of the reactive directional vector with a first orientation,
    -   a position of a third virtual sound source defines with the reference listening point a direction which is collinear to the direction of the reactive directional vector with a second orientation opposite to the first orientation;
-   there are two virtual sound sources for each time-frequency tile, each virtual sound source having a position with respect to the reference listening point, and wherein:
    -   a position of a first virtual sound source defines with the reference listening point a direction resulting from the sum of the active directional vector and the reactive directional vector weighted by a positive factor, and
    -   a position of a second virtual sound source defines with the reference listening point a direction resulting from the sum of the active directional vector and the reactive directional vector weighted by a negative factor;
-   information used for determining the contribution of a directional sound activity vector within a sub-division of space is an angular distance between a direction associated with said sub-division of space and the direction of said directional sound activity vector;
-   the contribution of a directional sound activity vector within a sub-division of space is determined by weighting a norm of said directional sound activity vector on the basis of an angular distance between a direction associated with said sub-division of space and the direction of said directional sound activity vector;
-   norms of the directional sound activity vectors are further weighted based on their respective frequency sub-bands;
-   at least two sets of directional sound activity vectors determined from the same input audio channels are weighted based on their respective frequency sub-bands in accordance with two different sets of weighting parameters, and the two resulting directional sound activities are displayed on the graphical representation;
-   the visualization of the directional sound activity of the multichannel audio signal comprises representations of said sub-division of space, each provided with a representation of the directional sound activity associated with said sub-division.

The invention also relates to a non-transitory tangible computer-readable medium having computer executable instructions embodied thereon that, when executed by a computer, perform the method according to the invention.

The invention also relates to an apparatus for visualizing directional sound activity of a multichannel audio signal, comprising:

-   an input for receiving time-dependent input audio signals for a plurality of input channels,
-   a processor and a memory for:
    -   performing a time-frequency conversion of said time-dependent input audio signals for converting each one of the time-dependent input audio signals into a plurality of time-frequency representations for the input channel associated with said time-dependent input audio signal, each time-frequency representation corresponding to a time-frequency tile defined by a time frame and a frequency sub-band, the time-frequency tiles being the same for the different input channels,
    -   for each time-frequency tile, determining an active directional vector and a reactive directional vector from time-frequency representations of different input channels for said time-frequency tile, wherein the active directional vector is determined from a real part of a complex intensity vector and the reactive directional vector is determined from an imaginary part of the complex intensity vector,
    -   for each time-frequency tile, determining positions of virtual sound sources with respect to the reference listening point in a virtual spatial configuration from the active directional vector and the reactive directional vector, and determining time-frequency signal values for each virtual sound source,
    -   for each time-frequency tile, determining a directional sound activity vector from the virtual sound sources,
    -   determining a contribution of each one of said directional sound activity vectors within sub-divisions of space on the basis of directivity information related to each sub-division of space,
    -   for each sub-division of space, determining directional sound activity data within said sub-division of space by summing said contributions within said sub-division of space,
-   a visualizing unit for displaying a visualization of the directional sound activity of the multichannel audio signal.

BRIEF DESCRIPTION OF THE DRAWINGS

Other aspects, objects and advantages of the present invention will become more apparent upon reading the following detailed description of preferred embodiments thereof, given as a non-limiting example, and made with reference to the appended drawings wherein:

FIG. 1, already discussed, shows an example of prescribed positions of loudspeakers with respect to a reference listening point in a prescribed spatial configuration for a multichannel audio system;

FIG. 2 is a diagram showing steps of the method;

FIG. 3 is a diagram showing stages of the signal processing in the method;

FIG. 4 shows schematically an example of a relationship between the active directional vector and the reactive directional vector with the locations of virtual sound sources;

FIG. 5 shows schematically an example of a virtual spatial configuration with two virtual sound sources, and the active directional vector and the reactive directional vector, and the cardioids of the two corresponding virtual microphones;

FIG. 6 shows schematically an example of a virtual spatial configuration with three virtual sound sources and the cardioids of the three corresponding virtual microphones, as well as the active directional vector and the reactive directional vector;

FIG. 7 illustrates a display layout according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The method may be implemented through the operation of a directional sound activity analyzing unit, which may be part of a device comprising a processor, typically a computer, further provided with means for acquiring audio signals and means for displaying a visualization of sound activity data, for example a visual display unit such as a screen or a computer monitor. The directional sound activity analyzing unit comprises means for executing the described method, such as a processor or any computing device, and a memory for buffering signals or storing various process parameters.

The directional sound activity analyzing unit receives an input signal constituted by a multichannel audio signal. This multichannel audio signal comprises K time-dependent input audio signals associated with K input audio channels, each time-dependent input audio signal being associated with an input channel. Each channel is associated with spatial information. Spatial information describes the location of the associated loudspeaker relative to the listener's location, called the reference listening point. For example, spatial information can be coordinates or angles and distances used to locate a loudspeaker with respect to the reference listening point, generally a listener's recommended location. Typically, three values per audio channel are provided to describe this localization. Spatial parameters constituting said spatial information may then be represented by a K×3 matrix.

An input receives the multichannel audio signal comprising time-dependent input audio signals for a plurality of input channels (step S01). Each time-dependent input audio signal is associated with an input channel. Each input channel corresponds to a prescribed position of an electroacoustic transducer with respect to a reference listening point in a prescribed spatial configuration. For example, in the prescribed spatial configuration shown by FIG. 1, there are five input channels, one for each loudspeaker LS, L, C, R, RS.

Under the plane-wave model assumption, the position of a sound source (e.g. the location of each loudspeaker) may be defined solely by the direction of the sound source with respect to the reference listening point. A unitary vector is then sufficient to locate a sound source. Accordingly, each of the prescribed positions defines a unitary vector $\vec{a_i}$ representing the sound direction, originating from the reference listening point and pointing in the direction of the corresponding loudspeaker. As a result, each input channel i is associated with a sound direction $\vec{a_i}$ defined between the reference listening point and the prescribed position of the loudspeaker associated with said input channel i. For example, in the prescribed spatial configuration shown in FIG. 1, the location of the loudspeaker C is defined by the sound vector $\vec{a_C}$ that originates from the reference listening point O and points towards the location of the loudspeaker C on the unitary circle. This sound vector $\vec{a_C}$ extends in front of the listening point. In a similar way, the location of the loudspeaker L is defined by the sound vector $\vec{a_L}$ that originates from the reference listening point O and points towards the location of the loudspeaker L on the unitary circle. In this example, the directions of the sound vector $\vec{a_C}$ and of the sound vector $\vec{a_L}$ are at an angle of 30°.
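As an illustration only, the K×3 matrix of unit sound directions for the five-channel layout of FIG. 1 could be built as in the following Python sketch. Only the 30° spacing between C and L is stated in the text; the ±110° surround azimuths, the axis convention (x towards the front, y towards the left, z up) and the names `AZIMUTHS_DEG` and `unit_directions` are assumptions made for this example.

```python
import numpy as np

# Assumed azimuths in degrees, counter-clockwise from the front (C at 0).
# Only the 30 degree C-to-L spacing comes from the text; the +/-110 degree
# surround angles are an assumption for illustration.
AZIMUTHS_DEG = {"L": 30.0, "R": -30.0, "C": 0.0, "LS": 110.0, "RS": -110.0}


def unit_directions(azimuths_deg):
    """Return a K x 3 array of unit vectors in the horizontal plane.

    Assumed axis convention: x points to the front, y to the left, z up.
    """
    az = np.radians(np.asarray(azimuths_deg, dtype=float))
    return np.stack([np.cos(az), np.sin(az), np.zeros_like(az)], axis=1)


names = list(AZIMUTHS_DEG)
directions = unit_directions([AZIMUTHS_DEG[n] for n in names])  # shape (5, 3)

# Angle between the centre and left loudspeaker directions.
a_C = directions[names.index("C")]
a_L = directions[names.index("L")]
angle_CL = np.degrees(np.arccos(np.clip(a_C @ a_L, -1.0, 1.0)))
```

Each row of `directions` plays the role of one sound direction, and `angle_CL` recovers the 30° angle between the C and L directions mentioned above.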

The directional sound activity analyzing unit receives these input audio channels, and then determines directional sound activity levels to be displayed for visualizing the directional sound activity of a multichannel audio signal. The directional sound activity analyzing unit is configured to perform the steps of the above-described method. The method is performed on an extracted part of the input signal corresponding to a temporal window. For example, a 50 ms duration analysis window can be chosen for analyzing the directional sound activity within said window.

Frequency Analysis

First, a frequency band analysis aims at estimating the sound activity level for a predetermined number of frequency sub-bands for each channel of the windowed multichannel audio signal.

The received time-dependent input audio signals $a_i(t)$ may be analog, but they preferably are digital signals. There are as many input audio signals $a_i(t)$ as input channels i. During the frequency analysis (step S10), the time-dependent input audio signals $a_i(t)$ are converted into the frequency domain by performing a time-frequency conversion (step S02). Typically, the time-frequency conversion uses a Fourier-related transform such as a Short-time Fourier transform (STFT), which is used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time.

More precisely, each time-dependent input audio signal $a_i(t)$ is converted into a plurality of time-frequency representations $A_i(k, n)$ for the input channel i associated with said time-dependent input audio signal. Each time-frequency representation $A_i(k, n)$ corresponds to a time-frequency tile defined by a time frame and a frequency sub-band. The conversion is made on a frame-by-frame basis.

Preferably, the frame length is between 5 ms and 80 ms. Preferably, the width of the frequency sub-band is between 10 Hz and 200 Hz. Preferably, the inter-frame spacing is between one sixteenth and one half of the frame length. For instance, for a sampling rate of 48 kHz and an FFT-based STFT processing framework, the frame length may be 1024 samples, with a related frequency sub-band width (or bin width) of 46.875 Hz and an inter-frame spacing of 512 samples. The time-frequency tiles are the same for the different input channels i.
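The figures in this example follow from simple arithmetic, which the short Python sketch below reproduces. The function name `stft_parameters` is hypothetical and only serves to check the quoted values against the preferred ranges.

```python
def stft_parameters(sample_rate_hz, frame_length, hop_length):
    """Derive the bin width, frame duration and relative hop of an FFT-based STFT."""
    bin_width_hz = sample_rate_hz / frame_length            # width of one frequency bin
    frame_duration_ms = 1000.0 * frame_length / sample_rate_hz
    hop_ratio = hop_length / frame_length                   # inter-frame spacing / frame length
    return bin_width_hz, frame_duration_ms, hop_ratio


# Values quoted in the text: 48 kHz, 1024-sample frames, 512-sample spacing.
bin_width_hz, frame_duration_ms, hop_ratio = stft_parameters(48_000, 1024, 512)
```

With these inputs the bin width is 48000 / 1024 = 46.875 Hz, the frame lasts about 21.3 ms, and the inter-frame spacing is one half of the frame length, all within the preferred ranges stated above.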

The frequency sub-bands are subdivisions of the frequency band of the audio signal, which can be divided into sub-bands of equal widths or preferably into sub-bands whose widths are dependent on human hearing sensitivity to the frequencies of said sub-bands.

In the following, k is used as a frequency index of a frequency sub-band and n is a frame index, so that the time-frequency representation $A_i(k, n)$ refers to a complex number associated with the k-th frequency sub-band and the n-th frame of the signal of the input channel i. The time-frequency representations $A_i(k, n)$ and the sound directions $\vec{a_i}$ are then used in a time-frequency processing (step S03) wherein the data of a time-frequency tile are processed.

Spatial Analysis

Spatial analysis (step S11) is performed from the time-frequency representations $A_i(k, n)$ and the sound directions $\vec{a_i}$ of a time-frequency tile. For each time-frequency tile, an active directional vector $\vec{D_a}(k, n)$ and a reactive directional vector $\vec{D_r}(k, n)$ are determined (step S31) from time-frequency representations $A_i(k, n)$ of different input channels for said time-frequency tile.

The active directional vector $\vec{D_a}(k, n)$ of a time-frequency tile is proportional to the active acoustical intensity vector, which is representative of the sound energy flow at the reference listening point for the time frame and the frequency sub-band of said time-frequency tile. More specifically, the active directional vector $\vec{D_a}(k, n)$ corresponds to the active acoustical intensity vector, normalized by the sum of the acoustic energies $E_P(k, n)$ and $E_K(k, n)$ at the reference listening point O, with an added minus sign in order to have it directed from the reference listening point O towards the unitary circle. It is possible to use a different normalization or to omit the minus sign, in which case the vectors would be pointing towards the reference listening point O.

The reactive directional vector $\vec{D_r}(k, n)$ is proportional to the reactive acoustical intensity vector, which is representative of acoustic perturbations at the reference listening point with respect to the sound energy flow for the same time-frequency tile. More specifically, the reactive directional vector $\vec{D_r}(k, n)$ corresponds to the reactive acoustical intensity vector, normalized by the sum of the acoustic energies $E_P(k, n)$ and $E_K(k, n)$ at the reference listening point O. A minus sign is also added but could be omitted. As for the active directional vector, it is possible to use a different normalization.

From a perceptual point of view, while the active directional vector $\vec{D_a}(k, n)$ can be related to the primary directional sound field, the reactive directional vector $\vec{D_r}(k, n)$ is related to the ambient diffuse sound field. Moreover, the directional information of the reactive directional vector $\vec{D_r}(k, n)$ enables the handling of the spatial characteristics of this ambient sound field, and thus it can be used to describe not only totally diffused ambient sound fields but also partially diffused ones.

This new approach is by nature more robust, as it benefits from the reliability of the active directional vector $\vec{D_a}(k, n)$, which is a true acoustical spatial cue (compared to the Gerzon vectors, which are empirical perceptual cues), but also exploits the diffuseness of sound through the reactive directional vector $\vec{D_r}(k, n)$.

It has been found that the combination of the active directional vector $\vec{D_a}(k, n)$ and the reactive directional vector $\vec{D_r}(k, n)$ may be used to identify the locations of sound sources, as depicted in the example of FIG. 4. In FIG. 4, sound distribution is represented by two virtual sound sources VS1 and VS2 arranged on a unitary circle centered on the reference listening point O. The active directional vector $\vec{D_a}(k, n)$ originates from the reference listening point O and is directed along the main acoustical flow. In this example, the two uncorrelated sound sources VS1, VS2 are of equal energy (for that time-frequency tile). As a result, the perceived acoustical energy flow at the reference listening point O comes from the middle of the two sound sources VS1, VS2, and therefore the active directional vector $\vec{D_a}(k, n)$ extends between the two sound sources VS1, VS2. The reactive directional vector $\vec{D_r}(k, n)$ is here perpendicular to the active directional vector $\vec{D_a}(k, n)$, and the location of a sound source VS1, VS2 corresponds to the sum of the active directional vector $\vec{D_a}(k, n)$ and of the reactive directional vector $\vec{D_r}(k, n)$ or of the opposite of the reactive directional vector $\vec{D_r}(k, n)$.

However, most of the time, the sound sources VS1, VS2 are not totally uncorrelated. It has been found that whatever the exact locations of the two sound sources VS1, VS2, the reactive intensity is maximal when the source signals are totally uncorrelated. Conversely, the reactive intensity is minimal when the source signals are totally correlated. In a similar way, when the sound source signals are totally uncorrelated, the reactive intensity is maximal when the source directions are spatially negatively correlated (i.e. opposite) with respect to the reference listening point O. Conversely, the reactive intensity is minimal when the source directions are spatially correlated (i.e. in the same direction) with respect to the reference listening point O.
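These trends can be illustrated on a single time-frequency tile with only two active channels, using the definitions of the sound pressure value, the sound velocity vector and the complex intensity vector given further on in this section. The channel labels 1 and 2 are introduced here for illustration only, and the indices (k, n) are dropped for readability. With $P = A_1 + A_2$ and $\vec{V} = -\frac{1}{\rho c}\left( A_1 \vec{a_1} + A_2 \vec{a_2} \right)$, the complex intensity vector expands as:

$\vec{I} = P^{*}\,\vec{V} = -\frac{1}{\rho c}\left[ |A_1|^2\,\vec{a_1} + |A_2|^2\,\vec{a_2} + A_1 A_2^{*}\,\vec{a_1} + A_1^{*} A_2\,\vec{a_2} \right]$

so that its real and imaginary parts are:

$\mathrm{Re}\,\vec{I} = -\frac{1}{\rho c}\left[ |A_1|^2\,\vec{a_1} + |A_2|^2\,\vec{a_2} + \mathrm{Re}\left( A_1 A_2^{*} \right)\left( \vec{a_1} + \vec{a_2} \right) \right]$

$\mathrm{Im}\,\vec{I} = -\frac{1}{\rho c}\,\mathrm{Im}\left( A_1 A_2^{*} \right)\left( \vec{a_1} - \vec{a_2} \right)$

In this simplified case, the imaginary part vanishes when the two channel values are in phase or when the two directions coincide, and the norm of $\vec{a_1} - \vec{a_2}$ reaches its maximum of 2 when the directions are opposite. Correlation between source signals is a statistical notion evaluated over many tiles, so this single-tile expansion should be read as an indication of the trend rather than as a proof of the statements above.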

For determining the active directional vector $\vec{D_a}(k, n)$ and the reactive directional vector $\vec{D_r}(k, n)$, the prescribed positions of the loudspeakers with respect to the reference listening point O in a prescribed spatial configuration are used. As indicated above, each input channel i is associated with a sound direction $\vec{a_i}$ defined between the reference listening point O and the prescribed position of the loudspeaker associated with said input channel i.

A sound pressure value $P(k, n)$ for a time-frequency tile, defined by a sum of the time-frequency representations $A_i(k, n)$ of the different input channels for said time-frequency tile, is determined:

$P(k,n) = \sum_{i} A_i(k,n)$

A sound velocity vector {right arrow over (V)}(k, n) for thetime-frequency tile is determined, said sound velocity vector {rightarrow over (V)}(k, n) being proportional to a sum of each sounddirection {right arrow over (a_(l))} weighted by the time-frequencyrepresentation A_(i)(k,n)corresponding to the input channel i associatedwith said sound direction {right arrow over (a_(l))}:

$\vec{V}(k,n) = -\frac{1}{\rho c}\sum_{i} A_{i}(k,n)\,\vec{a_{i}} = \begin{pmatrix} V_{x}(k,n) \\ V_{y}(k,n) \\ V_{z}(k,n) \end{pmatrix}$ where $\left\{ \begin{matrix} V_{x}(k,n) = -\frac{1}{\rho c}\sum_{i} A_{i}(k,n)\,\vec{a_{i}} \cdot \vec{e_{x}} \\ V_{y}(k,n) = -\frac{1}{\rho c}\sum_{i} A_{i}(k,n)\,\vec{a_{i}} \cdot \vec{e_{y}} \\ V_{z}(k,n) = -\frac{1}{\rho c}\sum_{i} A_{i}(k,n)\,\vec{a_{i}} \cdot \vec{e_{z}} \end{matrix} \right.$

with {right arrow over (e_(x))}, {right arrow over (e_(y))} and {right arrow over (e_(z))} the unitary vectors of a coordinate system used as a reference frame for the virtual spatial configuration, ρ the density of air and c the speed of sound. For example, the speed of sound in dry air at 20° C. is 343.2 meters per second, which may be approximated to 340 m·s⁻¹. At sea level and at 15° C., air density is approximately 1.225 kg/m³, which may be approximated to 1.2 kg/m³. Other values may be used.
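The two sums above can be sketched for a single time-frequency tile as follows. This is a minimal illustration only: the function name `pressure_and_velocity`, the constants `RHO` and `C` and the two-channel input are assumptions made for the example, and the directions are taken to be unit vectors given as (x, y, z) tuples.

```python
RHO = 1.2  # assumed air density in kg/m^3 (approximation quoted in the text)
C = 340.0  # assumed speed of sound in m/s (approximation quoted in the text)


def pressure_and_velocity(tile_values, directions, rho=RHO, c=C):
    """Return (P, V) for one time-frequency tile (k, n).

    tile_values: complex values A_i(k, n), one per input channel i.
    directions:  sound directions a_i as (x, y, z) tuples, one per channel.
    """
    if len(tile_values) != len(directions):
        raise ValueError("one direction is needed per input channel")
    # P(k, n) is the plain sum of the channel values.
    pressure = sum(tile_values)
    # V(k, n) = -(1 / (rho * c)) * sum_i A_i(k, n) * a_i, per component.
    velocity = tuple(
        -sum(a * d[axis] for a, d in zip(tile_values, directions)) / (rho * c)
        for axis in range(3)
    )
    return pressure, velocity


# Hypothetical tile with two channels: one direction on the x axis,
# one direction on the y axis.
P, V = pressure_and_velocity(
    [1 + 0j, 0 + 1j],
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
)
```

Each component of the velocity vector stays complex, which is what allows the real and imaginary parts of the intensity to be separated afterwards.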

A complex intensity vector {right arrow over (I)}(k, n) resulting from a complex product between a conjugate of the sound pressure value P(k, n) for a time-frequency tile and the sound velocity vector {right arrow over (V)}(k, n) for said time-frequency tile is determined:

$\vec{I}(k,n) = P(k,n)^{*}\,\vec{V}(k,n)$

and is used to determine the active directional vector {right arrow over (D_(a))}(k, n) and the reactive directional vector {right arrow over (D_(r))}(k, n) of said time-frequency tile. More precisely, the active directional vector {right arrow over (D_(a))}(k, n) is determined from the real part of the complex product {right arrow over (I)}(k, n) and the reactive directional vector {right arrow over (D_(r))}(k, n) is determined from the imaginary part of the complex product {right arrow over (I)}(k, n).

The active directional vector {right arrow over (D_(a))}(k, n) and the reactive directional vector {right arrow over (D_(r))}(k, n) may be calculated as follows:

$\left\{ \begin{matrix} \vec{D_{a}}(k,n) = -\Re\left( \frac{\vec{I}(k,n)}{c\left( E_{P}(k,n) + E_{K}(k,n) \right)} \right) \\ \vec{D_{r}}(k,n) = -\Im\left( \frac{\vec{I}(k,n)}{c\left( E_{P}(k,n) + E_{K}(k,n) \right)} \right) \end{matrix} \right.$ where $\left\{ \begin{matrix} E_{K}(k,n) = \frac{\rho}{2}\vec{V}(k,n)^{H}\vec{V}(k,n) = \frac{\rho}{2}\left( |V_{x}(k,n)|^{2} + |V_{y}(k,n)|^{2} + |V_{z}(k,n)|^{2} \right) \\ E_{P}(k,n) = \frac{1}{2\rho c^{2}}P(k,n)^{*}P(k,n) = \frac{1}{2\rho c^{2}}|P(k,n)|^{2} \end{matrix} \right.$

It shall be noted that the active directional vector {right arrow over (D_(a))}(k, n) and the reactive directional vector {right arrow over (D_(r))}(k, n) are here normalized by the energies E_(K)(k, n) and E_(P)(k, n), but could be calculated otherwise. It shall also be noted that a minus sign is added to the expressions of the active directional vector {right arrow over (D_(a))}(k, n) and reactive directional vector {right arrow over (D_(r))}(k, n) in order to have them directed from the reference listening point O towards the unitary circle. It would be possible to omit the minus sign, in which case the vectors would point towards the reference listening point O.
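The normalization above can be sketched as follows for one time-frequency tile. The function name `directional_vectors` is hypothetical, and returning null vectors for a silent tile (zero total energy) is an assumption of this sketch, since the text does not address that case.

```python
def directional_vectors(pressure, velocity, rho=1.2, c=340.0):
    """Return (D_a, D_r) from a complex pressure P and a complex velocity V."""
    # Complex intensity I = conj(P) * V, component by component.
    intensity = [pressure.conjugate() * v for v in velocity]
    # Kinetic and potential energy terms E_K and E_P.
    e_k = 0.5 * rho * sum(abs(v) ** 2 for v in velocity)
    e_p = abs(pressure) ** 2 / (2.0 * rho * c ** 2)
    scale = c * (e_p + e_k)
    if scale == 0.0:
        # Assumption: a silent tile yields null directional vectors.
        return (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)
    # Minus signs as in the expressions above.
    d_a = tuple(-i.real / scale for i in intensity)
    d_r = tuple(-i.imag / scale for i in intensity)
    return d_a, d_r


# Hypothetical tile: a single channel of unit amplitude whose sound
# direction is the x axis, so that P = 1 and V = -(1 / (rho * c)) * e_x.
rho, c = 1.2, 340.0
D_a, D_r = directional_vectors(1 + 0j, (-1.0 / (rho * c) + 0j, 0j, 0j))
```

For this single-channel tile the active vector comes out as the unit vector along x and the reactive vector is null, which matches the intuition that one isolated source produces no reactive component.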

Once the active directional vector {right arrow over (D_(a))}(k, n), the reactive directional vector {right arrow over (D_(r))}(k, n), the sound pressure value P(k, n) and the sound velocity vector {right arrow over (V)}(k, n) (or the equivalents thereof) have been determined, it is possible to perform the audio source extraction (step S12) for determining positions and time-frequency signal values of virtual sound sources (step S32).

Audio Source Extraction

The method requires determining the attributes (position and time-frequency signal values) of virtual sound sources that will be used thereafter to determine the signals of the electroacoustic transducers of the actual spatial configuration.

For each time-frequency tile, the active directional vector {right arrow over (D_(a))}(k, n) and the reactive directional vector {right arrow over (D_(r))}(k, n) are used to determine the positions of the virtual sound sources with respect to the reference listening point in a virtual spatial configuration (step S32).

The determined positions of the virtual sound sources, the active directional vector {right arrow over (D_(a))}(k, n), the reactive directional vector {right arrow over (D_(r))}(k, n), the sound pressure value P(k, n) and the sound velocity vector {right arrow over (V)}(k, n) are used to determine virtual first-order directional microphone signals (step S122) corresponding to the sounds that would be acquired by virtual microphones arranged at the reference listening point O and directed towards each virtual sound source. There are as many virtual microphones as virtual sound sources.

A virtual microphone signal is a function of the sum of the sound pressure value P(k, n) and of the scalar product between the sound velocity vector {right arrow over (V)}(k, n) and a unitary vector in the direction of a sound source, possibly weighted by the density of air ρ and the speed of sound c. For example, a virtual cardioid microphone signal M_(j)(k, n) associated with a virtual sound source arranged in the direction defined by {right arrow over (s_(j))}(k, n) can be calculated as follows:

${M_{j}\left( {k,n} \right)} = \frac{{P\left( {k,n} \right)} + {\rho \; c{{\overset{->}{V}\left( {k,n} \right)} \cdot {\overset{}{s_{J}}\left( {k,n} \right)}}}}{2}$

A virtual microphone signal highlights the sound of the corresponding virtual sound source perceived at the reference listening point O, but also contains interferences from the other virtual sound sources. However, defining the virtual microphone signals for every virtual sound source allows identifying the virtual sound source signal of each virtual sound source.

It shall be noted that spatial manipulation may be performed by modifying the positions of the virtual sound sources. This approach is much safer than modifying the input channel data side defining the prescribed positions, because the original primary/ambient energy ratio is kept.

The details of the source extraction process however change depending on the number of virtual sound sources. The audio source extraction process estimates the locations and frequency signal values of virtual sound sources that generate the same sound field characteristics as the sound field defined by the time-dependent input audio signals in the prescribed configuration. Source-related sound field models need to be defined, as the audio source extraction process may be highly different from one model to another. Two reliable models with the analysis based on the exploitation of both the active and reactive components of the acoustical intensity are described below: a model with two virtual sound sources and a model with three virtual sound sources.

The “two-source” model handles the diffuseness (and thus makes use of the reactive component) as an indicator of the perceptual width of a sound source or local diffuseness. Two sound sources are sufficient to simulate a wider sound source, their spatial and signal correlation defining the perceived wideness of this composite sound source. The “three-source” model handles the diffuseness (and thus makes use of the reactive component) as an indicator of the ambience level within the sound scene or global diffuseness. Two uncorrelated sound sources of opposite directions are suitable to simulate this ambient component, in addition to a first virtual sound source corresponding to the primary component. It is explained below how to proceed with two virtual sound sources or three virtual sound sources.

Source Extraction: Two Virtual Sound Sources

In a spatial configuration of a unitary circle centered on the reference listening point O, the virtual sound sources are positioned on the unitary circle. A position of a virtual sound source is therefore at the intersection of the unitary circle with a directional line extending from the reference listening point. The position of each virtual sound source can be defined by a unitary source direction vector {right arrow over (s_(j))}(k, n) originating from the reference listening point. This is shown in FIG. 5.

As indicated above, the first step of the source extraction consists in determining the positions of the two virtual sound sources (step S121). As shown in FIG. 5, each unitary source direction vector {right arrow over (s_(j))}(k, n) is defined through the active directional vector {right arrow over (D_(a))}(k, n) and reactive directional vector {right arrow over (D_(r))}(k, n). More precisely, a virtual sound source is located at the intersection of

-   the unitary circle and
-   a line collinear with the reactive directional vector {right arrow over (D_(r))}(k, n) and passing through the tip of the active directional vector {right arrow over (D_(a))}(k, n) originating from the reference listening point.

If the analyzed sound field is generated by two uncorrelated sound sources (not necessarily of equal energy), this technique makes it possible to retrieve the exact location of those two sound sources. If the two sound sources used to generate the sound field tend to be in-phase (respectively opposite-phase), their exact locations cannot be retrieved anymore. The technique over-estimates (respectively under-estimates) the spatial correlation between the two sound source directions. However, this relationship between signal correlation and spatial correlation is perceptively coherent.

Determining the locations of the two virtual sound sources VS1, VS2 is equivalent to solving a geometry problem of the intersection of a line with a circle (or a sphere for a three-dimensional sound field). Solving this problem is equivalent to solving a second order equation, whose solutions are

$\left\{ {\begin{matrix}{{\overset{}{s_{1}}\left( {k,n} \right)} = {{\overset{}{D_{a}}\left( {k,n} \right)} - {\frac{{\beta \left( {k,n} \right)} + \sqrt{\Delta \left( {k,n} \right)}}{2{\alpha \left( {k,n} \right)}}{\overset{}{D_{r}}\left( {k,n} \right)}}}} \\{{\overset{}{s_{2}}\left( {k,n} \right)} = {{\overset{}{D_{a}}\left( {k,n} \right)} - {\frac{{\beta \left( {k,n} \right)} - \sqrt{\Delta \left( {k,n} \right)}}{2{\alpha \left( {k,n} \right)}}{\overset{}{D_{r}}\left( {k,n} \right)}}}}\end{matrix}{with}\left\{ \begin{matrix}{{\alpha \left( {k,n} \right)} = {{\overset{}{D_{r}}\left( {k,n} \right)}}^{2}} \\{{\beta \left( {k,n} \right)} = {2{{\overset{}{D_{a}}\left( {k,n} \right)} \cdot {\overset{}{D_{r}}\left( {k,n} \right)}}}} \\{{\Delta \left( {k,n} \right)} = {{\beta \left( {k,n} \right)}^{2} - {4{\alpha \left( {k,n} \right)}\left( {{{\overset{}{D_{a}}\left( {k,n} \right)}}^{2} - 1} \right)}}}\end{matrix} \right.} \right.$

It shall be noted that:

-   a position of a first virtual sound source VS1 defines, with the reference listening point O, a direction resulting from the sum of the active directional vector {right arrow over (D_(a))}(k, n) and the reactive directional vector {right arrow over (D_(r))}(k, n) weighted by a positive factor, and
-   a position of a second virtual sound source VS2 defines, with the reference listening point O, a direction resulting from the sum of the active directional vector {right arrow over (D_(a))}(k, n) and the reactive directional vector {right arrow over (D_(r))}(k, n) weighted by a negative factor.

We thus have a source direction vector {right arrow over (s₁)}(k, n) of a first virtual sound source VS1, and a source direction vector {right arrow over (s₂)}(k, n) of a second virtual sound source VS2. As depicted in FIG. 5, these source direction vectors {right arrow over (s₁)}(k, n), {right arrow over (s₂)}(k, n) localize the virtual sound sources VS1, VS2 on the unitary circle centered on the reference listening point O.

As explained above, after the computation of the directions of the two virtual sound sources VS1, VS2, it is possible, by combining the sound pressure value P(k, n) and the sound velocity vector {right arrow over (V)}(k, n) with the source direction vectors {right arrow over (s₁)}(k, n), {right arrow over (s₂)}(k, n), to create two virtual directional microphones. As depicted in FIG. 5, the two virtual directional microphones may have cardioid directivity patterns VM1, VM2 in the directions of the source direction vectors {right arrow over (s₁)}(k, n), {right arrow over (s₂)}(k, n). The virtual microphone pick-up in these two directions may then be estimated by virtual microphone signals M₁(k, n), M₂(k, n) defined as follows:

$\left\{ \begin{matrix} M_{1}(k,n) = \frac{P(k,n) + \rho c\,\vec{V}(k,n) \cdot \vec{s_{1}}(k,n)}{2} \\ M_{2}(k,n) = \frac{P(k,n) + \rho c\,\vec{V}(k,n) \cdot \vec{s_{2}}(k,n)}{2} \end{matrix} \right.$

As explained above, each virtual microphone signal highlights the sound signal of the corresponding virtual sound source VS1, VS2 perceived at the reference listening point O, but also contains interferences from the other virtual sound source:

$\left\{ \begin{matrix} M_{1}(k,n) = S_{1}(k,n) + \mu(k,n)\,S_{2}(k,n) \\ M_{2}(k,n) = S_{2}(k,n) + \mu(k,n)\,S_{1}(k,n) \end{matrix} \right.$ with $\mu(k,n) = \frac{1 + \vec{s_{1}}(k,n) \cdot \vec{s_{2}}(k,n)}{2}$

where S₁(k, n) is the time-frequency signal value of the first virtual sound source VS1 and S₂(k, n) is the time-frequency signal value of the second virtual sound source VS2. A last processing step makes it possible to extract the time-frequency signal values S₁(k, n), S₂(k, n) of each virtual sound source by unmixing the source signals from the virtual microphone signals (step S123):

$\left\{ \begin{matrix} S_{1}(k,n) = \frac{S_{sum}(k,n) + S_{diff}(k,n)}{2} \\ S_{2}(k,n) = \frac{S_{sum}(k,n) - S_{diff}(k,n)}{2} \end{matrix} \right.$ with $\left\{ \begin{matrix} S_{sum}(k,n) = \frac{M_{1}(k,n) + M_{2}(k,n)}{1 + \mu(k,n)} \\ S_{diff}(k,n) = \frac{M_{1}(k,n) - M_{2}(k,n)}{1 - \mu(k,n)} \end{matrix} \right.$

The positions of the two virtual sound sources VS1, VS2, defined by the source direction vectors {right arrow over (s₁)}(k, n) and {right arrow over (s₂)}(k, n), and their respective time-frequency signal values S₁(k, n) and S₂(k, n) have been determined.
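The unmixing step for two virtual sound sources can be sketched as below. Adding and subtracting the two mixing relations gives M₁ + M₂ = (1 + μ)(S₁ + S₂) and M₁ − M₂ = (1 − μ)(S₁ − S₂), which is why dividing by 1 + μ and 1 − μ recovers the sum and the difference of the source signals. The function name `unmix_two` and the test values are hypothetical; rejecting the case μ = 1 (identical source directions) is an assumption of this sketch.

```python
def unmix_two(m1, m2, s1, s2):
    """Recover (S1, S2) from two virtual cardioid signals and their directions."""
    # Cross-talk factor mu = (1 + s1 . s2) / 2 between the two cardioids.
    mu = (1.0 + sum(a * b for a, b in zip(s1, s2))) / 2.0
    if mu == 1.0:
        # Assumption: identical directions cannot be unmixed.
        raise ValueError("source directions coincide")
    s_sum = (m1 + m2) / (1.0 + mu)
    s_diff = (m1 - m2) / (1.0 - mu)
    return (s_sum + s_diff) / 2.0, (s_sum - s_diff) / 2.0


# Hypothetical check: build M1 and M2 from known source values, then unmix.
true_s1, true_s2 = 2 + 1j, -1 + 0.5j
dir1, dir2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)  # perpendicular, so mu = 0.5
mix1 = true_s1 + 0.5 * true_s2
mix2 = true_s2 + 0.5 * true_s1
est_s1, est_s2 = unmix_two(mix1, mix2, dir1, dir2)
```

The estimated values match the source values used to build the mixtures, up to floating-point rounding.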

It shall be noted that the two virtual sound sources VS1, VS2 are equivalent, in the sense that they both contain a primary component (through the active directional vector {right arrow over (D_(a))}(k, n)) and ambient components (through the reactive directional vector {right arrow over (D_(r))}(k, n)). An ambience extraction processing may be performed for implementing additional refinement.

Audio Source Extraction: Three Virtual Sound Sources

As explained before, the first step of the audio source extraction consists in determining the positions of the three virtual sound sources, through unitary source direction vectors {right arrow over (s_(j))}(k, n) defined by the active directional vector {right arrow over (D_(a))}(k, n) and reactive directional vector {right arrow over (D_(r))}(k, n). In a spatial configuration of a unitary circle centered on the reference listening point O, the virtual sound sources are positioned on the unitary circle. A position of a virtual sound source is therefore at the intersection of the unitary circle with a directional line extending from the reference listening point. The position of each virtual sound source can be defined by a unitary source direction vector {right arrow over (s_(j))}(k, n) originating from the reference listening point. The unitary source direction vector {right arrow over (s_(j))}(k, n) is defined through the active directional vector {right arrow over (D_(a))}(k, n) and reactive directional vector {right arrow over (D_(r))}(k, n). This is shown in FIG. 6.

As already explained, the active directional vector {right arrow over (D_(a))}(k, n) indicates the main perceptual sound event direction, while the reactive intensity indicates the “direction of maximum perceptual diffuseness”. Using three virtual sound sources VS1, VS2, VS3 thus appears relevant to approximate the sound field properties:

-   one virtual sound source VS1 is in the direction of the active directional vector {right arrow over (D_(a))}(k, n) to represent the reconstruction of the main acoustic flow, and
-   two virtual sound sources VS2, VS3, negatively spatially correlated, are in the direction of the reactive directional vector {right arrow over (D_(r))}(k, n) and its opposite direction, respectively, to represent the acoustic perturbations of the acoustic field.

As a consequence:

-   a position of a first virtual sound source VS1 defines with the reference listening point O a direction which is collinear to the direction of the active directional vector {right arrow over (D_(a))}(k, n) from the reference listening point,
-   a position of a second virtual sound source VS2 defines with the reference listening point O a direction which is collinear to the direction of the reactive directional vector {right arrow over (D_(r))}(k, n) from the reference listening point with a first orientation,
-   a position of a third virtual sound source VS3 defines with the reference listening point O a direction which is collinear to the direction of the reactive directional vector {right arrow over (D_(r))}(k, n) from the reference listening point O with a second orientation opposite to the first orientation.

Indeed, determining the positions of the virtual sound sources VS1, VS2, VS3 is much simpler for the three-source model than for the two-source model, since their source direction vectors {right arrow over (s_(j))}(k, n) are directly computed from the active directional vector {right arrow over (D_(a))}(k, n) and the reactive directional vector {right arrow over (D_(r))}(k, n):

$\left\{ {\begin{matrix}{{\overset{}{s_{1}}\left( {k,n} \right)} = \frac{\overset{}{D_{a}}\left( {k,n} \right)}{{{\overset{}{D_{a}}\left( {k,n} \right)}}^{2}}} \\{{\overset{}{s_{2}}\left( {k,n} \right)} = \frac{\overset{}{D_{r}}\left( {k,n} \right)}{{{\overset{}{D_{r}}\left( {k,n} \right)}}^{2}}} \\{{\overset{}{s_{3}}\left( {k,n} \right)} = {- \frac{\overset{}{D_{r}}\left( {k,n} \right)}{{{\overset{}{D_{r}}\left( {k,n} \right)}}^{2}}}}\end{matrix}\quad} \right.$

with a first source direction vector {right arrow over (s₁)}(k, n) of a first virtual sound source VS1, a second source direction vector {right arrow over (s₂)}(k, n) of a second virtual sound source VS2, and a third source direction vector {right arrow over (s₃)}(k, n) of a third virtual sound source VS3. As depicted in FIG. 6, these source direction vectors localize the virtual sound sources VS1, VS2, VS3 on the unitary circle centered on the reference listening point O.
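The three source directions can be sketched as below, normalizing each directional vector by its norm so that the resulting source direction vectors are unitary, as the text requires. The function name `three_source_directions` is hypothetical, and raising an error for a null active or reactive vector is an assumption of this sketch.

```python
import math


def three_source_directions(d_a, d_r):
    """Return unit vectors (s1, s2, s3) along D_a, D_r and -D_r."""
    norm_a = math.sqrt(sum(x * x for x in d_a))
    norm_r = math.sqrt(sum(x * x for x in d_r))
    if norm_a == 0.0 or norm_r == 0.0:
        # Assumption: a null vector has no direction to normalize.
        raise ValueError("active and reactive vectors must be non-null")
    s1 = tuple(x / norm_a for x in d_a)
    s2 = tuple(x / norm_r for x in d_r)
    s3 = tuple(-x for x in s2)
    return s1, s2, s3


# Hypothetical tile: active vector in the horizontal plane, reactive along z.
s1, s2, s3 = three_source_directions((0.3, 0.4, 0.0), (0.0, 0.0, 0.5))
```

Here the first direction is approximately (0.6, 0.8, 0) and the two ambient directions are (0, 0, 1) and (0, 0, −1).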

As explained above, after the computation of the directions of the three virtual sound sources VS1, VS2, VS3, it is possible, by combining the sound pressure value P(k, n) and the sound velocity vector {right arrow over (V)}(k, n) with a source direction vector, to create three virtual directional microphones. As depicted in FIG. 6, the three virtual directional microphones may have cardioid directivity patterns VM1, VM2, VM3 in the directions of the source direction vectors {right arrow over (s₁)}(k, n), {right arrow over (s₂)}(k, n), {right arrow over (s₃)}(k, n). The virtual microphone pick-ups in these three directions may then be estimated by virtual microphone signals defined as follows:

$\left\{ \begin{matrix} M_{1}(k,n) = \frac{P(k,n) + \rho c\,\vec{V}(k,n) \cdot \vec{s_{1}}(k,n)}{2} \\ M_{2}(k,n) = \frac{P(k,n) + \rho c\,\vec{V}(k,n) \cdot \vec{s_{2}}(k,n)}{2} \\ M_{3}(k,n) = \frac{P(k,n) + \rho c\,\vec{V}(k,n) \cdot \vec{s_{3}}(k,n)}{2} \end{matrix} \right.$

As explained above, each virtual microphone signal M₁(k, n), M₂(k, n), M₃(k, n) highlights the sound of the corresponding virtual sound source VS1, VS2, VS3 perceived at the reference listening point O, but also contains interferences from the other virtual sound sources VS1, VS2, VS3. More precisely, since the second source direction vector {right arrow over (s₂)}(k, n) and the third source direction vector {right arrow over (s₃)}(k, n) are of opposite direction, interference between the second virtual sound source VS2 and the third virtual sound source VS3 is negligible, whereas they both interfere with the first virtual sound source VS1:

$\left\{ \begin{matrix} M_{1}(k,n) = S_{1}(k,n) + \mu_{12}(k,n)\,S_{2}(k,n) + \mu_{13}(k,n)\,S_{3}(k,n) \\ M_{2}(k,n) = S_{2}(k,n) + \mu_{12}(k,n)\,S_{1}(k,n) \\ M_{3}(k,n) = S_{3}(k,n) + \mu_{13}(k,n)\,S_{1}(k,n) \end{matrix} \right.$ with $\left\{ \begin{matrix} \mu_{12}(k,n) = \frac{1 + \vec{s_{1}}(k,n) \cdot \vec{s_{2}}(k,n)}{2} = \frac{1 + \frac{\vec{D_{a}}(k,n)}{\left\| \vec{D_{a}}(k,n) \right\|} \cdot \frac{\vec{D_{r}}(k,n)}{\left\| \vec{D_{r}}(k,n) \right\|}}{2} \\ \mu_{13}(k,n) = \frac{1 + \vec{s_{1}}(k,n) \cdot \vec{s_{3}}(k,n)}{2} = \frac{1 - \frac{\vec{D_{a}}(k,n)}{\left\| \vec{D_{a}}(k,n) \right\|} \cdot \frac{\vec{D_{r}}(k,n)}{\left\| \vec{D_{r}}(k,n) \right\|}}{2} \end{matrix} \right.$

A last processing step (step S123) makes it possible to extract the time-frequency signal value of each virtual sound source by unmixing the source time-frequency values:

$\left\{ \begin{matrix} S_{1}(k,n) = \frac{M_{1}(k,n) - \left( \mu_{12}(k,n)\,M_{2}(k,n) + \mu_{13}(k,n)\,M_{3}(k,n) \right)}{1 - \left( \mu_{12}(k,n)^{2} + \mu_{13}(k,n)^{2} \right)} \\ S_{2}(k,n) = M_{2}(k,n) - \mu_{12}(k,n)\,S_{1}(k,n) \\ S_{3}(k,n) = M_{3}(k,n) - \mu_{13}(k,n)\,S_{1}(k,n) \end{matrix} \right.$
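The three-source unmixing can be sketched as follows. Since the third direction is the opposite of the second, μ₁₃ = 1 − μ₁₂, so only the first two source directions are needed. The function name `unmix_three` and the test values are hypothetical; rejecting a null denominator, which occurs when the active and reactive directions are collinear, is an assumption of this sketch.

```python
def unmix_three(m1, m2, m3, s1, s2):
    """Recover (S1, S2, S3) from three virtual cardioid signals.

    s1 and s2 are the unit directions of VS1 and VS2; VS3 is along -s2.
    """
    cos12 = sum(a * b for a, b in zip(s1, s2))
    mu12 = (1.0 + cos12) / 2.0
    mu13 = (1.0 - cos12) / 2.0
    denom = 1.0 - (mu12 ** 2 + mu13 ** 2)
    if denom == 0.0:
        # Assumption: collinear active and reactive directions are rejected.
        raise ValueError("active and reactive directions are collinear")
    sig1 = (m1 - (mu12 * m2 + mu13 * m3)) / denom
    sig2 = m2 - mu12 * sig1
    sig3 = m3 - mu13 * sig1
    return sig1, sig2, sig3


# Hypothetical check with perpendicular directions (mu12 = mu13 = 0.5):
# build the three mixtures from known source values, then unmix them.
src1, src2, src3 = 1 + 2j, 0.5 + 0j, -0.25j
mic1 = src1 + 0.5 * src2 + 0.5 * src3
mic2 = src2 + 0.5 * src1
mic3 = src3 + 0.5 * src1
out1, out2, out3 = unmix_three(mic1, mic2, mic3, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

The recovered values match the source values used to build the mixtures, up to floating-point rounding.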

Contrary to the model with two virtual sound sources, the three virtual sound sources are already decomposed between primary components and ambient components:

-   the first virtual sound source VS1 corresponds to the primary component, and
-   the second virtual sound source VS2 and third virtual sound source VS3 correspond to the ambient components.

Directional Sound Activity Vector

Once the attributes of the virtual sound sources have been determined (positions and time-frequency signal values), it is possible to determine a directional sound activity vector related to a time-frequency tile from the virtual sound sources. This directional sound activity vector represents the predominant sound direction that would be perceived by a listener according to the recommended loudspeaker layout for sounds within the particular frequency sub-band of the time-frequency tile.

The attributes of the directional sound activity vector are calculated from the positions and time-frequency signal values of the virtual sound sources.

With three virtual sound sources, the energy vectors relative to the sound sources of a time-frequency tile are:

$\left\{ {\begin{matrix}{{\overset{}{E_{1}}\left( {k,n} \right)} = {{{S_{1}\left( {k,n} \right)}}^{2}{\overset{}{s_{1}}\left( {k,n} \right)}}} \\{{\overset{}{E_{2}}\left( {k,n} \right)} = {{{S_{2}\left( {k,n} \right)}}^{2}{\overset{}{s_{2}}\left( {k,n} \right)}}} \\{{\overset{}{E_{3}}\left( {k,n} \right)} = {{{S_{3}\left( {k,n} \right)}}^{2}{\overset{}{s_{3}}\left( {k,n} \right)}}}\end{matrix}\quad} \right.$

The energy vector sum representative of the perceived directional energy is then:

$\vec{E}(k,n) = \sum_{j=1}^{3} \vec{E_{j}}(k,n)$

The first virtual sound source VS1 is more related to the main perceptual sound event direction and the two other virtual sound sources VS2, VS3 are more related to the direction of the maximum perceptual diffuseness. It may then be relevant to take only the first virtual sound source VS1 for the directional sound activity vector. Generally, a weighting of the different virtual sound sources VS1, VS2, VS3 may be used, with a source-weighted directional sound activity vector expressed as:

$\vec{E}(k,n) = \sum_{j=1}^{3} \omega_{j}\,\vec{E_{j}}(k,n)$

with ω_(j) weighting factors between 0 and 1. Preferably, the sum of the weighting factors ω_(j) is 1. Preferably, none of the weighting factors ω_(j) is 0. Preferably, the weighting factors ω₂ and ω₃ are equal. Preferably, ω₁>ω₂, and ω₁>ω₃.

With two virtual sound sources, it is also possible to use a weighted sum of the energy vectors relative to the two sound sources as with the three virtual sound sources. Preferably, the two weighting factors ω₁ and ω₂ are equal.
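The weighted sum of energy vectors can be sketched for any number of virtual sound sources as below. The function name `activity_vector`, the default of unit weights and the numerical values are assumptions made for the illustration.

```python
def activity_vector(signals, directions, weights=None):
    """Return E = sum_j w_j * |S_j|^2 * s_j for one time-frequency tile."""
    if weights is None:
        # Assumption: without weights, every source counts fully.
        weights = [1.0] * len(signals)
    if not (len(signals) == len(directions) == len(weights)):
        raise ValueError("signals, directions and weights must align")
    return tuple(
        sum(w * abs(s) ** 2 * d[axis]
            for w, s, d in zip(weights, signals, directions))
        for axis in range(3)
    )


# Hypothetical three-source tile: a primary source on the x axis and two
# ambient sources of equal energy in opposite directions along y.
E = activity_vector(
    [2 + 0j, 1j, 1j],
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)],
    weights=[0.5, 0.25, 0.25],
)
```

In this example the two ambient contributions cancel each other and the result is (2, 0, 0), so the directional sound activity vector points towards the primary source.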

Frequency Masking

An optional, however advantageous, frequency masking (step S13) can adapt directional sound activity vectors according to their respective frequency sub-bands. In order to tune reactivity with respect to sound frequencies, the norms of the directional sound activity vectors can be weighted based on their respective frequency sub-bands. The weighted directional sound activity vector is then

{right arrow over (G)}[k, n]=α[k]·{right arrow over (E)}[k, n]

where α[k] is a weight, for instance between 0 and 1, which depends on the frequency sub-band of each directional sound activity vector. Such a weighting allows enhancing particular frequency sub-bands of particular interest for the user. This feature can be used for discriminating sounds based on their frequencies. For instance, frequencies related to particularly interesting sounds can be enhanced in order to distinguish them from ambient noise. The directional sound analyzing unit can be fed with spectral sensitivity parameters which define the weight attributed to each frequency sub-band.
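The frequency masking amounts to scaling each sub-band vector by its weight, as in the short sketch below. The function name `apply_frequency_mask` and the weight values are hypothetical.

```python
def apply_frequency_mask(activity_vectors, band_weights):
    """Scale the activity vector of each sub-band k by its weight."""
    if len(activity_vectors) != len(band_weights):
        raise ValueError("one weight is needed per frequency sub-band")
    return [
        tuple(w * component for component in vector)
        for w, vector in zip(band_weights, activity_vectors)
    ]


# Hypothetical frame with two sub-bands; the second one is attenuated by half.
G = apply_frequency_mask(
    [(2.0, 0.0, 0.0), (0.0, 4.0, 0.0)],
    [1.0, 0.5],
)
```

Because the weight is a non-negative scalar, only the norm of each vector changes and its direction is preserved.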

In order to directionally visualize sound activity, space is divided into sub-divisions which are intended to discretely represent the acoustic environment of the listener. FIG. 7 shows an example of such a divided space relative to a 5.1 loudspeaker layout. A polar representation of the listener's environment is divided into M similar sub-divisions 6 circularly disposed around the reference listening point in a central position representing the listener's location. Loudspeakers of the recommended layout of FIG. 1 are represented for comparison.

For each frequency sub-band, the dominant sound direction and the sound activity level associated with said direction are now determined and described by the directional sound activity vector, preferably weighted as described above. The visualization of such directional information must be very intuitive so that sound direction information can be conveyed to the user without interfering with other sources of information.

The beam clustering stage (S14) corresponds to allocating to each of the sub-divisions a part of each frequency sub-band sound activity.

To this end, the contributions of each frequency sub-band sound activity to each sub-division of space are determined on the basis of directivity information. For each sub-division of space, a directional sound activity level is determined within said sub-division of space by combining, for instance by summing, the contributions of said frequency sub-band sound activity to said sub-division of space.

Directivity information is associated with each sub-division 6. Such directivity information relates to level modulation as a function of direction in an oriented coordinate system, typically centered on a listener's position. This directivity information can be described by a directivity function which associates a weight to space directions in an oriented coordinate system. Typically, such a directivity function exhibits a maximum for a direction associated with the related sub-division.

For each sub-division 6 of space, norms of directional sound activity vectors are weighted on the basis of a directivity information associated with said sub-division 6 of space and the directions of said directional sound activity vectors. These weighted norms can thus represent the contribution of said directional sound activity vectors within said sub-divisions of space.

For instance, a directivity function can be parameterized by a beam vector {right arrow over (v_(m))} and an angular value θ_(m) corresponding to the angular width of the beam, wherein m identifies a space sub-division. The direction associated with a sub-division 6 can be the main direction defined by the beam vector {right arrow over (v_(m))}. Accordingly, the angular distance between a beam vector {right arrow over (v_(m))} and a directional sound activity vector {right arrow over (G)}[k] can define the clustering weight C_(m)[k]. For instance, a simple directional weighting function may be 1 if the angular distance between a beam vector {right arrow over (v_(m))} and a directional sound activity vector {right arrow over (G)}[k] is at most θ_(m)/2 and 0 otherwise:

${C_{m}\lbrack k\rbrack} = \left\{ \begin{matrix}{{1\mspace{14mu} {if}\mspace{14mu} {{angle}\left( {\overset{}{v_{m}},{\overset{->}{G}\lbrack k\rbrack}} \right)}} \leq {\theta_{m}/2}} \\{{0\mspace{14mu} {if}\mspace{14mu} {{angle}\left( {\overset{}{v_{m}},{\overset{->}{G}\lbrack k\rbrack}} \right)}} > {\theta_{m}/2}}\end{matrix} \right.$

The beam vector {right arrow over (v_(m))} and the angular value θ_(m)used for define the parameters of the directivity function canconstitute an example of directivity information by which contributionof each one of said directional sound activity vectors withinsub-divisions of space can be estimated.

The directional sound activity within a beam or sub-division of space can then be determined by summing said contributions, such as the weighted norms in this example, of said directional sound activity vectors related to L frequency sub-bands:

$A_m = \sum_{k=1}^{L} C_m[k] \, \left\lVert \vec{G}[k] \right\rVert$
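As a purely illustrative sketch, the binary weighting and the summation above could be written as follows in Python for the two-dimensional case. The function names (`angle_between`, `clustering_weight`, `beam_activity`, `make_beams`), the representation of vectors as (x, y) tuples, the treatment of a null vector, and the choice of M beams of equal width evenly spread around the listener are assumptions made for the example and are not prescribed by the method.

```python
import math


def angle_between(u, v):
    """Return the angle in radians between two non-zero 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    norms = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def clustering_weight(beam_vector, beam_width, g):
    """Binary weight C_m[k]: 1 inside the beam, 0 outside."""
    if math.hypot(*g) == 0.0:
        # Assumption: a null activity vector has no direction and contributes nothing.
        return 0.0
    return 1.0 if angle_between(beam_vector, g) <= beam_width / 2 else 0.0


def beam_activity(beam_vector, beam_width, activity_vectors):
    """Activity A_m: sum over sub-bands k of C_m[k] * ||G[k]||."""
    return sum(
        clustering_weight(beam_vector, beam_width, g) * math.hypot(*g)
        for g in activity_vectors
    )


def make_beams(m):
    """Assumed layout: m beams of equal width 2*pi/m, evenly spaced."""
    width = 2 * math.pi / m
    return [((math.cos(i * width), math.sin(i * width)), width) for i in range(m)]


# Three sub-bands: two vectors pointing along +x, one along +y.
vectors = [(2.0, 0.0), (0.0, 3.0), (1.0, 0.0)]
activities = [beam_activity(v, w, vectors) for v, w in make_beams(4)]
```

With these inputs and four beams, the beam centered on +x accumulates the norms 2.0 and 1.0, the beam centered on +y accumulates 3.0, and the two remaining beams stay at zero. A smoother directivity function could replace the binary test without changing the summation.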

Once determined, the directional sound activity for each of the M beams can be fed to a visualizing unit, typically a screen associated with the computer which comprises or constitutes the directional sound analyzing unit.

For every space sub-division 6, such as the beams illustrated in FIG. 7, directional sound activity can then be displayed for visualization (step S04). A graphical representation of directional sound activity level within said sub-division of space is displayed, as in FIG. 7. In the displayed graphical representation, sub-divisions of space are organized according to their respective location within said space, so as to reconstruct the divided space.

FIG. 7 shows a configuration wherein the directional sound activity is confined to two different beams, suggesting that sound sources related to different frequencies are located in the directions related to these two beams. It shall be noted that at least one beam 16a shows a directional sound activity without having a direction that corresponds to a recommended loudspeaker orientation. As can be seen, a user can easily and accurately infer sound source directions, and thus can retrieve sound direction information originally conveyed by the multichannel audio input signal.

Other graphical representations can be used, such as a radar chart wherein directional sound activity levels are represented on axes starting from the center, lines or curves being drawn between the directional sound activity levels of adjacent axes. Preferably, the lines or curves define a colored geometrical shape containing the center.
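By way of a non-limiting sketch, the outline of such a radar chart could be derived from the per-beam activity levels as follows. The function name `radar_polygon`, the even angular spacing of the axes, and the normalization of the largest level to `max_radius` are assumptions made for this example; the actual drawing is left to whatever plotting facility the visualizing unit provides.

```python
import math


def radar_polygon(levels, max_radius=1.0):
    """Return the (x, y) vertices of a closed radar-chart outline.

    One axis per sub-division, evenly spaced around the center (assumed).
    Levels are scaled so that the largest one reaches max_radius (assumed).
    """
    m = len(levels)
    peak = max(levels) if levels else 0.0
    scale = max_radius / peak if peak > 0 else 0.0
    vertices = []
    for i, level in enumerate(levels):
        angle = 2 * math.pi * i / m
        radius = level * scale
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    if vertices:
        # Repeat the first vertex so that the outline is closed.
        vertices.append(vertices[0])
    return vertices


outline = radar_polygon([3.0, 3.0, 0.0, 0.0])
```

Joining consecutive vertices with straight lines yields the polygon, which can then be filled with a color to obtain the shape described above.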

The invention thus allows sound direction information to be delivered to the user even if said user does not possess the recommended loudspeaker layout, for example with headphones. It can also be very helpful for hearing-impaired people or for users who must identify sound directions quickly and accurately.

Preferably, the graphical representation shows several directional sound activity levels for each sub-division, these directional sound activity levels being calculated with different frequency masking parameters.

For example, at least two sets of spectral sensitivity parameters are chosen to parameterize two frequency masking processes respectively used in two directional sound activity level determination processes. The two sets of directional sound activity vectors determined from the same input audio channels are weighted based on their respective frequency sub-bands in accordance with two different sets of weighting parameters. Consequently, for each sub-division, each one of the two directional sound activity levels enhances some particular frequencies in order to distinguish different sound types.
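The following sketch illustrates one possible way of obtaining two such levels for a single sub-division. It assumes, for the sake of the example only, that frequency masking reduces to one multiplicative weight per sub-band; the function name `masked_activity` and the two weight sets `low_emphasis` and `high_emphasis` are hypothetical.

```python
def masked_activity(cluster_weights, norms, band_weights):
    """Sum over sub-bands of C_m[k] * w[k] * ||G[k]|| for one sub-division.

    cluster_weights: directional weights C_m[k] for this sub-division.
    norms: norms of the directional sound activity vectors, one per sub-band.
    band_weights: assumed per-sub-band frequency weights w[k].
    """
    if not (len(cluster_weights) == len(norms) == len(band_weights)):
        raise ValueError("all inputs must have one entry per sub-band")
    return sum(c * w * n for c, w, n in zip(cluster_weights, band_weights, norms))


# Four sub-bands, ordered from low to high frequency.
cluster_weights = [1.0, 1.0, 0.0, 1.0]
norms = [2.0, 1.0, 5.0, 0.5]

# Two hypothetical weight sets emphasizing low and high frequencies respectively.
low_emphasis = [1.0, 1.0, 0.0, 0.0]
high_emphasis = [0.0, 0.0, 1.0, 1.0]

low_level = masked_activity(cluster_weights, norms, low_emphasis)
high_level = masked_activity(cluster_weights, norms, high_emphasis)
```

Here the same sub-division yields a level of 3.0 under the low-frequency weighting and 0.5 under the high-frequency weighting, so the two levels can be drawn side by side or superimposed.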

The two directional sound activities can then be displayed simultaneously within the same sub-divided space, for example with a color code for distinguishing them and a superimposition, for instance based on level differences.

The method of the present invention as described above can be realized as a program and stored on a non-transitory tangible computer-readable medium, such as a CD-ROM, a ROM or a hard disk, having computer executable instructions embodied thereon that, when executed by a computer, perform the method according to the invention.

While the present invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the appended claims.

What is claimed is:
1. A method for visualizing a directional sound activity of a multichannel audio signal, wherein the multichannel audio signal comprises time-dependent input audio signals, each time-dependent input audio signal being associated with an input channel, spatial information with respect to a reference listening point being associated with each one of said channels, the method comprising: receiving the time-dependent input audio signals; performing a time-frequency conversion of said time-dependent input audio signals for converting each one of the time-dependent input audio signals into a plurality of time-frequency representations for the input channel associated with said time-dependent input audio signal, each time-frequency representation corresponding to a time-frequency tile defined by a time frame and a frequency sub-band, the time-frequency tiles being the same for the different input channels; for each time-frequency tile, determining positions of at least two virtual sound sources with respect to the reference listening point and frequency signal values for each virtual sound source from an active directional vector and a reactive directional vector determined from time-frequency representations of different input audio signals for said time-frequency tile, wherein the active directional vector is determined from a real part of a complex intensity vector and the reactive directional vector is determined from an imaginary part of the complex intensity vector; for each time-frequency tile, determining a directional sound activity vector from the virtual sound sources; determining a contribution of each one of said directional sound activity vectors within sub-divisions of space on the basis of directivity information related to each sub-division of space; for each sub-division of space, determining directional sound activity level within said sub-division of space by summing said contributions within said sub-division of space; displaying a visualization of the directional sound activity of the multichannel audio signal by a graphical representation of directional sound activity level within said sub-division of space.
2. The method of claim 1, wherein the active directional vector of a time-frequency tile is representative of the sound energy flow at the reference listening point for the time frame and a frequency sub-band of said time-frequency tile, and wherein the reactive directional vector is representative of acoustic perturbations at the reference listening point with respect to the sound energy flow.
3. The method according to claim 1, wherein each input channel is associated with a sound direction defined between the reference listening point and the prescribed position of the speaker associated with said input channel, and a sound velocity vector is determined as a function of a sum of each sound direction weighted by the time-frequency representation corresponding to the input channel associated with said sound direction, said sound velocity vector being used to determine the active directional vector and the reactive directional vector.
4. The method according to claim 1, wherein a sound pressure value defined by a sum of the time-frequency representations of the different input channels is used to determine the active directional vector and the reactive directional vector.
5. The method according to claim 1, wherein the complex intensity vector results from a complex product between a conjugate of a sound pressure value for a time-frequency tile and a sound velocity vector for said time-frequency tile.
6. The method according to claim 1, wherein for determining time-frequency signal values of each one of the virtual sound sources, virtual microphone signals are determined, each virtual microphone signal being associated with a virtual sound source and corresponding to the signal that would be acquired by a virtual microphone arranged at the reference listening point and oriented in the direction toward the position of said virtual sound source.
7. The method according to claim 6, wherein the time-frequency signal value of a virtual sound source is determined by suppressing, in the virtual microphone signal associated with said virtual sound source, the interferences from other virtual sound sources.
8. The method according to claim 6, wherein the virtual sound sources are arranged on a circle centered on the reference listening point and a virtual microphone signal corresponds to the signal that would be acquired by a virtual cardioid microphone having a cardioid directivity pattern in the shape of a cardioid tangential to the circle centered on the reference listening point.
9. The method according to claim 1, wherein there are three virtual sound sources for each time-frequency tile, each virtual sound source having a position with respect to the reference listening point, wherein: a position of a first virtual sound source defines with the reference listening point a direction which is collinear to the direction of the active directional vector from the reference listening point, a position of a second virtual sound source defines with the reference listening point a direction which is collinear to the direction of the reactive directional vector with a first orientation, a position of a third virtual sound source defines with the reference listening point a direction which is collinear to the direction of the reactive directional vector with a second orientation opposite to the first orientation.
10. The method according to claim 1, wherein there are two virtual sound sources for each time-frequency tile, each virtual sound source having a position with respect to the reference listening point, and wherein: a position of a first virtual sound source defines with the reference listening point a direction resulting from the sum of the active directional vector and the reactive directional vector weighted by a positive factor, and a position of a second virtual sound source defines with the reference listening point a direction resulting from the sum of the active directional vector and the reactive directional vector weighted by a negative factor.
11. The method of claim 1, wherein directional information used for determining the contribution of a directional sound activity vector within a sub-division of space is an angular distance between a direction associated with said sub-division of space and the direction of said directional sound activity vector.
12. The method according to claim 1, wherein the contribution of a directional sound activity vector within a sub-division of space is determined by weighting a norm of said directional sound activity vector on the basis of an angular distance between a direction associated with said sub-division of space and the direction of said directional sound activity vector.
13. The method of claim 1, wherein norms of the directional sound activity vectors are further weighted based on their respective frequency sub-bands.
14. The method of claim 13, wherein at least two sets of directional sound activity vectors determined from the same input audio channels are weighted based on their respective frequency sub-bands in accordance with two different sets of weighting parameters, and the two resulting directional sound activities are displayed on the graphical representation.
15. The method of claim 1, wherein the visualization of the directional sound activity of the multichannel audio signal comprises representations of said sub-division of space, each provided with a representation of the directional sound activity associated with said sub-division.
16. A non-transitory tangible computer-readable medium having computer executable instructions embodied thereon that, when executed by a computer, perform the method of any one of claims 1 to 15.
17. An apparatus for visualizing directional sound activity of a multichannel audio signal, comprising: an input for receiving time-dependent input audio signals for a plurality of input channels, a processor and a memory for: performing a time-frequency conversion of said time-dependent input audio signals for converting each one of the time-dependent input audio signals into a plurality of time-frequency representations for the input channel associated with said time-dependent input audio signal, each time-frequency representation corresponding to a time-frequency tile defined by a time frame and a frequency sub-band, the time-frequency tiles being the same for the different input channels, for each time-frequency tile, determining an active directional vector and a reactive directional vector from time-frequency representations of different input channels for said time-frequency tile, wherein the active directional vector is determined from a real part of a complex intensity vector and the reactive directional vector is determined from an imaginary part of the complex intensity vector, for each time-frequency tile, determining positions of at least two virtual sound sources with respect to the reference listening point in a virtual spatial configuration from the active directional vector and the reactive directional vector, and determining time-frequency signal values for each virtual sound source, for each time-frequency tile, determining a directional sound activity vector from the virtual sound sources, determining a contribution of each one of said directional sound activity vectors within sub-divisions of space on the basis of directivity information related to each sub-division of space, for each sub-division of space, determining directional sound activity data within said sub-division of space by summing said contributions within said sub-division of space, a visualizing unit for displaying a visualization of the directional sound activity of the multichannel audio signal.